Abstract

Higher-order accurate density estimation under random right censorship is achieved using kernel estimators from a family of infinite-order kernels. A compatible bandwidth selection procedure is also proposed that automatically adapts to the level of smoothness of the underlying lifetime density. The combination of infinite-order kernels with the new bandwidth selection procedure produces a considerably improved estimate of the lifetime density and hazard function, surpassing the performance of competing estimators. Infinite-order estimators are also utilized in a secondary manner as pilot estimators in the plug-in approach for bandwidth choice in second-order kernels. Simulations illustrate the improved accuracy of the proposed estimator against other nonparametric estimators of the density and hazard function.
1 Introduction

In kernel-based density estimation of uncensored iid data, the usefulness of infinite-order kernels, or "superkernels", is well known; cf. [?]. In general, using superkernels reduces bias by orders of magnitude without increasing the order of magnitude of the variance, thus producing estimates with better mean square error (MSE) properties. One would therefore imagine higher-order, if not infinite-order, kernels to be much more popular than second-order kernels; this, however, has not been the case, mainly for three reasons that affect both iid density estimation and density estimation of censored data.

The foremost concern with using high-order kernels is the possibility of the estimate being negative at some places when it is known that densities are nonnegative everywhere. The simple fix of truncating all negative values to zero works well but produces a secondary issue: the resulting estimate of the pdf no longer integrates to one. The area can, however, be renormalized back to one while retaining the same MSE order. Therefore the complaint that higher-order kernels produce density estimates that are not densities themselves is inconsequential, since any lack of nonnegativity can easily be remedied.
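
The truncate-and-renormalize fix described above can be sketched in a few lines. This is only an illustration under stated assumptions: the estimate is taken to be available on an equally spaced grid, and the function name `clip_and_renormalize`, the grid spacing `dx`, and the toy values are hypothetical and do not come from the paper.

```python
def clip_and_renormalize(values, dx):
    """Truncate negative density values to zero and rescale to unit area.

    `values` are estimates on an equally spaced grid with spacing `dx`
    (both are illustrative assumptions, not notation from the paper).
    """
    clipped = [max(v, 0.0) for v in values]
    area = sum(clipped) * dx  # Riemann-sum approximation of the integral
    if area <= 0.0:
        raise ValueError("estimate has no positive mass on the grid")
    return [v / area for v in clipped]


# Toy estimate with small negative side lobes on a grid of spacing 0.5.
raw = [-0.02, 0.10, 0.80, 1.00, 0.15, -0.03]
fixed = clip_and_renormalize(raw, dx=0.5)
```

The rescaling multiplies the clipped estimate by a single constant, so the shape of the estimate where it is positive is unchanged.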

Closely tied to the problem of choosing an appropriate kernel in kernel density estimation is the problem of choosing the correct bandwidth. Because second-order kernels have been so popular, several bandwidth selection procedures have been proposed and analyzed for these kernels; refer to [?] for a review of several methods. However, most of the bandwidth procedures for second-order kernels do not carry over to infinite-order kernels, and the methods that do carry over, like cross-validation, are known to have poor performance properties [?]. Yet in 2001, Politis [13] showed how a very simple and intuitive bandwidth selection algorithm works well with infinite-order kernels.

Another concern about using infinite-order kernels is not their asymptotic performance, which is guaranteed, but rather their finite-sample performance. Specifically, in using high-order kernels, the bias improves at the cost of increasing the variance by some constant factor independent of the sample size. Indeed, there are many poor choices of infinite-order kernels, just as there are many poor choices of second-order kernels. One of the simplest and most popular infinite-order kernels is the sinc function, which is a very poor choice due to its large and slowly decaying side lobes. However, a class of favorably performing infinite-order kernels has been proposed in [3] and has been shown to outperform mainstream second-order kernels in finite-sample simulations. In addition to being infinite-order, it is also advisable to have a kernel with tails that die off fast; this was also noted by Devroye in [?]. Requiring the kernel to have tails that die off quickly is equivalent to the kernel's Fourier transform being very smooth. So the reason the sinc function is such a poor choice of kernel is that its Fourier transform is a rectangle, the least smooth flat-top shape. Improvements on the rectangle shape include the trapezoid and the infinitely differentiable flat-top function of McMurry and Politis [11]. This paper adopts the infinite-order kernel that is derived from the trapezoidal shape, as it is a simple choice of kernel that works well in practice.

As a simple example to illustrate the effectiveness of the proposed density estimator, we present the results of a simple simulation with uncensored iid data. In the simulation, we estimate the pdf of a N(0, 1) distribution with datasets of sizes n = 50 and n = 500 and two estimators: the infinite-order estimator with the bandwidth selection procedure described in this paper (see Section 5 for the exact estimator used) and the default density estimator used in R version 4.2.1 (density) with a Gaussian kernel and its built-in bandwidth selection procedure. After repeating the simulations over 10,000 realizations, we compute the mean square error at three points (x = 0, 1, 2) and on an equally spaced grid of 41 points in the interval [-2,2]. Here are the results:

                  x = 0           x = 1           x = 2        avg on [-2,2]
n               50     500      50     500      50     500      50     500
MSE_infinite    2.42   .346     1.86   .299     .973   .115     1.83   .262
MSE_density     4.37   .755     2.48   .420     .956   .151     2.58   .439

MSE values are blown up by 10^3 for easier comparison.

Table 1: Comparison of the proposed infinite-order kernel density estimator with the Gaussian kernel density estimator on iid N(0,1) data.

We see that even using a Gaussian kernel to estimate a Gaussian density is not as good as using the infinite-order kernels with the accompanying bandwidth selection procedure that is proposed in Section 3.

In the next section, we define the general class of flat-top infinite-order kernels and, through Theorem 1, describe how using these kernels can cause the bias of the kernel density estimators of censored data to become essentially negligible in certain situations. Section 3 completes the proposed estimator by providing a bandwidth selection algorithm that automatically adapts to the unknown density at hand. In Section 5 we give practical suggestions for implementing the proposed estimator and provide several simulations exhibiting optimal performance in estimating the lifetime density and hazard function when compared with other nonparametric estimators including the muhaz estimator [12] and the logspline estimator.


2 The Flat-Top Estimators 

We lay out the notation in the context of random right censorship, which can be generalized to also allow for left truncation; see for example [?]. Let $X_1^0, \ldots, X_n^0$ be iid lifetime variables with density $f$ and cdf $F$, and independently, let $U_1, \ldots, U_n$ be iid censoring variables with cdf $G$. We observe the data $Z_i$ and $\Delta_i$ where

$$Z_i = \min\{X_i^0, U_i\} \quad\text{and}\quad \Delta_i = 1_{[X_i^0 \le U_i]} \in \{0, 1\}$$

for $i = 1, \ldots, n$ (here $1_{[\cdot]}$ represents the indicator function). We order the pairs $(Z_i, \Delta_i)$ according to the $Z_i$'s and relabel them as $(X_i, \delta_i)$ where $X_i = Z_{(i)}$, the $i$th order statistic of the $Z$'s, and $\delta_i$ is the indicator variable that accompanies $X_i$, i.e. the concomitant of $X_i$. The Kaplan-Meier estimator is the nonparametric maximum likelihood estimate of the survival function $S(t) = 1 - F(t)$, given by

$$\hat S(t) = \begin{cases} 1, & 0 \le t \le X_1 \\ \prod_{j=1}^{k-1} \left(\dfrac{n-j}{n-j+1}\right)^{\delta_j}, & X_{k-1} < t \le X_k,\ k = 2, \ldots, n \\ 0, & t > X_n \end{cases}$$

where the height of the jump of $\hat S$ at $X_j$ is

$$s_j = \begin{cases} \hat S(X_j) - \hat S(X_{j+1}), & j = 1, \ldots, n-1 \\ \hat S(X_n), & j = n. \end{cases}$$
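
The jump heights of the Kaplan-Meier estimator defined above can be computed in one pass over the ordered data. The following is a minimal sketch, assuming no ties among the observed times; the function name `kaplan_meier_jumps` and the toy data are illustrative and not part of the paper.

```python
def kaplan_meier_jumps(times, events):
    """Return the ordered times X_j, indicators delta_j and jumps s_j.

    times  : observed values Z_i = min(lifetime, censoring time)
    events : 1 if the lifetime was observed, 0 if it was censored
    Ties are assumed absent, as for continuous data.
    """
    pairs = sorted(zip(times, events))
    n = len(pairs)
    surv = 1.0  # value of the survival estimate at X_j (before its jump)
    jumps = []
    for j, (_, delta) in enumerate(pairs, start=1):
        if j < n:
            new_surv = surv * ((n - j) / (n - j + 1)) ** delta
        else:
            new_surv = 0.0  # the estimator is set to zero beyond X_n
        jumps.append(surv - new_surv)
        surv = new_surv
    xs = [t for t, _ in pairs]
    deltas = [d for _, d in pairs]
    return xs, deltas, jumps


# Four observations, the second one censored.
xs, deltas, s = kaplan_meier_jumps([2.0, 1.0, 4.0, 3.0], [0, 1, 1, 1])
```

A censored observation receives a jump of zero and its mass is passed on to the later observations, while the last jump absorbs whatever mass remains, so the jumps always sum to one.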
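The kernel estimate of $f$ is constructed through the convolution of $\hat F = 1 - \hat S$ with a smooth kernel $K$, i.e.

$$\hat f(x) = \frac{1}{h} \int K\left(\frac{x-t}{h}\right) d\hat F(t) = \frac{1}{h} \sum_{j=1}^{n} s_j K\left(\frac{x - X_j}{h}\right), \qquad (1)$$

where $s_j$ denotes the height of the jump of $\hat S$ at $X_j$ and $h > 0$ is the bandwidth.
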
Many authors require K to be of compact support for ease of analysis, but this is unnecessary; see for example [22]. Therefore we only assume K is an even function that integrates to one.
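
As a hedged illustration of the weighted kernel estimate, the sketch below evaluates a sum of the form (1/h) Σ_j s_j K((x − X_j)/h) with jump heights s_j supplied by the user. A standard normal kernel is used only because it is an even function integrating to one; the names `censored_kde` and `gaussian_kernel` and the toy inputs are assumptions made for this example.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, an even kernel that integrates to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def censored_kde(x, points, jumps, h, kernel=gaussian_kernel):
    """Evaluate (1/h) * sum_j s_j * K((x - X_j) / h) at a single point x."""
    return sum(s * kernel((x - xj) / h) for xj, s in zip(points, jumps)) / h


# Ordered observations and Kaplan-Meier jump heights (illustrative values).
points = [0.3, 0.9, 1.4, 2.2]
jumps = [0.25, 0.0, 0.375, 0.375]
grid = [-4.0 + 0.01 * i for i in range(1101)]  # covers [-4, 7]
dens = [censored_kde(x, points, jumps, h=0.4) for x in grid]
area = sum(dens) * 0.01  # Riemann-sum check of the total mass
```

Because the jump heights sum to one and the kernel integrates to one, the estimate integrates to one as well, which the Riemann sum in the last line checks numerically.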

It will be assumed that sufficient conditions on the density $f$ or kernel $K$ are satisfied so that

$$\operatorname{var} \hat f(x) = O\left(\frac{1}{nh}\right); \qquad (2)$$

some sufficient conditions for (2) are provided in [?].

Following [?], we now describe a class of infinite-order kernels constructed from the Fourier transform of a flat-top function. We start in the Fourier domain with a function $\kappa$ given by

$$\kappa(t) = \begin{cases} 1, & |t| \le 1 \\ g(t), & \text{otherwise} \end{cases} \qquad (3)$$

where $g$ is any continuous, square-integrable function that is bounded in absolute value by 1 and satisfies $g(\pm 1) = 1$ ($g$ will typically be compactly supported, but this is not required). Then the infinite-order kernel corresponding to $\kappa$ is the Fourier transform of $\kappa$, specifically,

$$K(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \kappa(t) e^{-itx}\, dt. \qquad (4)$$

Let $\phi(t)$ be the characteristic function corresponding to $f(x)$, i.e. $\phi(t)$ is the inverse Fourier transform of $f(x)$ given by

$$\phi(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\, dx.$$

The following three assumptions quantify the degree of smoothness of the density $f(x)$ by the rate of decay of its characteristic function.

Assumption A(r): There is an $r > 0$ such that $\int_{-\infty}^{\infty} |t|^r |\phi(t)|\, dt < \infty$.

Assumption B: There are positive constants $d$ and $D$ such that $|\phi(t)| \le D e^{-d|t|}$.

Assumption C: There is a positive constant $b$ such that $\phi(t) = 0$ when $|t| \ge b$.

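Theorem 1. Suppose $\hat f(x)$ is the kernel estimator as defined in (1) with infinite-order kernel given by (4), and assume the variance assumption in (2) holds.

(i) Suppose assumption A(r) holds. Let $h \sim a n^{-\beta}$ with $\beta = (2r+1)^{-1}$; then

$$\sup_{x \in \mathbb{R}} \left| \operatorname{bias}\{\hat f(x)\} \right| = o\left(n^{-r/(2r+1)}\right) \quad\text{and}\quad \operatorname{MSE}\{\hat f(x)\} = O\left(n^{-2r/(2r+1)}\right).$$

(ii) Suppose assumption B holds. Let $h \sim 1/(a \log n)$ where $a$ is a constant such that $a > 1/(2d)$; then

$$\sup_{x \in \mathbb{R}} \left| \operatorname{bias}\{\hat f(x)\} \right| = O\left(\frac{1}{\sqrt{n}}\right) \quad\text{and}\quad \operatorname{MSE}\{\hat f(x)\} = O\left(\frac{\log n}{n}\right).$$

(iii) Suppose assumption C holds. Let $h \le 1/b$; then

$$\sup_{x \in \mathbb{R}} \left| \operatorname{bias}\{\hat f(x)\} \right| = 0 \quad\text{and}\quad \operatorname{MSE}\{\hat f(x)\} = O\left(\frac{1}{n}\right).$$
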

Corollary 1. The hazard function $H(x) = f(x)/S(x)$ is easily estimated by $\hat H(x) = \hat f(x)/\hat S(x)$, and since $\hat S$ is a $\sqrt{n}$-convergent estimator of $S$, this estimate of the hazard function has the same MSE convergence rates as $\hat f$ in the above theorem. Specifically:

(i) Under assumption A(r), $\operatorname{MSE}(\hat H(x)) = O\left(n^{-2r/(2r+1)}\right)$;

(ii) Under assumption B, $\operatorname{MSE}(\hat H(x)) = O\left(\frac{\log n}{n}\right)$;

(iii) Under assumption C, $\operatorname{MSE}(\hat H(x)) = O\left(\frac{1}{n}\right)$.

The $p$th derivative of $f$ can be estimated by the $p$th derivative of $\hat f(x)$; i.e. if $K^{(p)}(x)$ is the $p$th derivative of $K(x)$, then

$$\hat f_p(x) = \frac{1}{h^{p+1}} \int K^{(p)}\left(\frac{x-t}{h}\right) d\hat F(t) \qquad (5)$$

is an estimate of the $p$th derivative of $f$. It can be shown, under sufficient conditions on $f$, that the variance of this estimator is

$$\operatorname{var} \hat f_p(x) = O\left(\frac{1}{n h^{p+1}}\right). \qquad (6)$$

The previous theorem is now generalized in the following theorem to give asymptotic bias and MSE rates of $\hat f_p(x)$ with infinite-order kernels.

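Theorem 2. Suppose $\hat f_p(x)$ is the kernel estimator as defined in (5) where $K$ is an infinite-order kernel, and assume the variance assumption in (6) holds.

(i) Suppose assumption A(r + p) holds. Let $h \sim a n^{-\beta}$ with $\beta = (2r + p + 1)^{-1}$; then

$$\sup_{x \in \mathbb{R}} \left| \operatorname{bias}\{\hat f_p(x)\} \right| = o\left(n^{-r/(2r+p+1)}\right) \quad\text{and}\quad \operatorname{MSE}\{\hat f_p(x)\} = O\left(n^{-2r/(2r+p+1)}\right).$$

(ii) Suppose assumption B holds. Let $h \sim 1/(a \log n)$ where $a$ is a constant such that $a > 1/(2d)$; then

$$\sup_{x \in \mathbb{R}} \left| \operatorname{bias}\{\hat f_p(x)\} \right| = O\left(\frac{1}{\sqrt{n}}\right) \quad\text{and}\quad \operatorname{MSE}\{\hat f_p(x)\} = O\left(\frac{(\log n)^{p+1}}{n}\right).$$

(iii) Suppose assumption C holds. Let $h \le 1/b$; then

$$\sup_{x \in \mathbb{R}} \left| \operatorname{bias}\{\hat f_p(x)\} \right| = 0 \quad\text{and}\quad \operatorname{MSE}\{\hat f_p(x)\} = O\left(\frac{1}{n}\right).$$
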

In particular, we see that if the underlying density is infinitely smooth (as in the 
case of assumptions B and C), then the same asymptotic MSE rates of fp{x) hold for 
every p. 



3 Bandwidth Selection for Flat-Top Estimators 

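Let $\hat\phi(t)$ be an estimate of the characteristic function $\phi(t)$ given by

$$\hat\phi(t) = \int_{-\infty}^{\infty} e^{itx}\, d\hat F(x) = \sum_{j=1}^{n} s_j e^{itX_j},$$

where $s_j$ is the height of the jump of $\hat S$ at $X_j$. We now follow the general recipe for bandwidth selection using flat-top kernels that is detailed in [3]. Specifically, we determine the smallest value $t^*$ such that $\hat\phi(t) \approx 0$ for all $t \in (t^*, t^* + \epsilon)$ for some pre-specified $\epsilon$; the estimate of the bandwidth is then $\hat h = 1/t^*$. The details are provided in the following algorithm.

Bandwidth Selection Algorithm

Let $C > 0$ be a fixed constant, and $\epsilon_n$ be a nondecreasing sequence of positive real numbers tending to infinity such that $\epsilon_n = o(\log n)$. Let $t^*$ be the smallest number such that

$$|\hat\phi(t)| < C \sqrt{\frac{\log n}{n}} \quad \text{for all } t \in (t^*, t^* + \epsilon_n). \qquad (7)$$

Then let $\hat h = 1/t^*$.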
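
The thresholding rule above lends itself to a simple grid search. The sketch below is one possible implementation under explicit assumptions: the constant `C = 2`, the window length `eps = 1`, the search range `t_max`, the grid `step`, and the fallback when no window qualifies are all illustrative choices that are not specified in the paper, and the data are uncensored so that every jump height equals 1/n.

```python
import cmath
import math
import random


def empirical_cf(t, points, jumps):
    """phi_hat(t) = sum_j s_j * exp(i * t * X_j)."""
    return sum(s * cmath.exp(1j * t * x) for x, s in zip(points, jumps))


def select_bandwidth(points, jumps, C=2.0, eps=1.0, t_max=20.0, step=0.05):
    """Grid-search sketch of the rule: find the smallest grid value t* with
    |phi_hat(t)| < C * sqrt(log(n) / n) for all grid t in (t*, t* + eps]."""
    n = len(points)
    thresh = C * math.sqrt(math.log(n) / n)
    m = int(round(t_max / step))
    below = [abs(empirical_cf(k * step, points, jumps)) < thresh
             for k in range(m + 1)]
    window = int(round(eps / step))
    for k in range(1, m - window + 1):
        if all(below[k + 1:k + window + 1]):
            return 1.0 / (k * step)
    return 1.0 / t_max  # fallback (an assumption) when no window qualifies


random.seed(0)
n = 300
sample = sorted(random.gauss(0.0, 1.0) for _ in range(n))
weights = [1.0 / n] * n  # uncensored case: every jump equals 1/n
h_hat = select_bandwidth(sample, weights)
```

For censored data the same function would be called with the Kaplan-Meier jump heights in place of the equal weights. In practice the threshold is often read off a plot of |φ̂(t)| rather than computed, as Remark 1 below notes.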

Remark 1. The positive constant $C$ is irrelevant in the asymptotic theory, but is relevant for finite-sample calculations. The main idea behind the algorithm is to determine the smallest $t$ such that $\hat\phi(t) \approx 0$; in most cases this can be seen visually without explicitly computing the threshold in (7).

Remark 2. If $g(t)$ in (3) is identically one, or very close to one, in a neighborhood of the type $[1, 1 + \eta]$, then the "flat-top radius" is effectively increased to some value $1 + \eta$. In this case, we would let $\hat h = (1 + \eta)/t^*$ in the bandwidth selection algorithm.

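Theorem 3. Assume the following two natural assumptions:

$$\max_{s \in (0,1)} \left| \hat\phi(t + s) - \phi(t + s) \right| = O_P\left(1/\sqrt{n}\right)$$

uniformly in $t$, and

$$\max_{s \in (0,n)} \left| \hat\phi(t + s) - \phi(t + s) \right| = O_P\left(\sqrt{\frac{\log n}{n}}\right).$$

(i) Assume $\phi(t) \sim A|t|^{-d}$ for some positive constants $A$ and $d$. Then

$$\hat h \overset{P}{\sim} \tilde A \left(\frac{\log n}{n}\right)^{1/(2d)}$$

for some positive constant $\tilde A$; here $A \overset{P}{\sim} B$ means $A/B \to 1$ in probability.

(ii) Assume $\phi(t) \sim A\xi^{|t|}$ for some $\xi \in (0, 1)$ and $A > 0$. Then

$$\hat h \overset{P}{\sim} 1/(\tilde A \log n),$$

where $\tilde A = -1/\log \xi$.

(iii) Assume $|\phi(t)| = 0$ when $t \ge b$. Then $\hat h \overset{P}{\sim} 1/b$.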

Theorem 3 shows how the proposed bandwidth selection algorithm adapts to the underlying degree of smoothness of the density, matching nearly identically the ideal bandwidths in Theorem 1. When there is only polynomial decay of the characteristic function, as in part (i) of the above theorem, the bandwidth selection algorithm produces a slightly smaller bandwidth than the theoretically optimal bandwidth given in Theorem 1, but the discrepancy diminishes with faster decay.



4 Bandwidth Selection for 2nd-Order Kernels



We now propose a bandwidth selection procedure for use with second-order kernels, 
based on using the infinite-order estimators as pilots in the plug-in approach to band- 
width selection. Indeed, Theorem [1] points out the superiority of using the infinite-order 
kernels over second order kernels, but as a stepping stone to using infinite-order kernels 
directly in estimation, we now introduce the infinite-order kernels in pilot estimation. 
The result is a bandwidth selection procedure that converges very fast (comparable to 
the rates of Theorem [1]) . 

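We begin with the MSE and mean integrated square error (MISE) of $\hat f(x)$ with a second-order kernel $\Lambda$ and standard assumptions on $\Lambda$ [18, 21]:

$$\operatorname{MSE}(\hat f(x)) = h^4 \left( \frac{f''(x)}{2} \int_{-\infty}^{\infty} x^2 \Lambda(x)\, dx \right)^2 + \frac{1}{nh} \cdot \frac{f(x)}{1 - G(x)} \int_{-\infty}^{\infty} \Lambda^2(z)\, dz$$
$$\qquad + \frac{1}{n} \cdot f(x)^2 \left[ \int_{-\infty}^{x} \frac{f(r)}{(1 - F(r))^2 (1 - G(r))}\, dr - \frac{1}{(1 - F(x))(1 - G(x))} \right] + O(h^6) + O\left(\frac{h}{n}\right) + o\left(\frac{1}{nh}\right)$$

$$\operatorname{MISE}(\hat f) = \frac{h^4}{4} \int_{-\infty}^{\infty} f''(x)^2 \omega(x)\, dx \left( \int_{-\infty}^{\infty} x^2 \Lambda(x)\, dx \right)^2 + \frac{1}{nh} \int_{-\infty}^{\infty} \frac{f(x)}{1 - G(x)} \omega(x)\, dx \int_{-\infty}^{\infty} \Lambda^2(x)\, dx$$
$$\qquad + \frac{1}{n} \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{x} \frac{f(r)}{(1 - F(r))^2 (1 - G(r))}\, dr - \frac{1}{(1 - F(x))(1 - G(x))} \right] f(x)^2 \omega(x)\, dx$$
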

The MISE above has been generalized slightly to incorporate a nonnegative weight function $\omega(x)$ to control the influence of error in the tails of the estimated density. If we minimize the above MSE with respect to $h$, we arrive at the optimal bandwidth for estimating the density at a given point. And if we minimize the MISE with respect to $h$, then we arrive at an optimal global bandwidth. The optimal bandwidths in each situation will involve values of the unknown underlying density that we wish to estimate, so we are forced to use some initial, or pilot, estimate of these values. Minimizing the leading terms of the above MSE and MISE leads to the optimal bandwidths $h_{\mathrm{MSE}}$ and $h_{\mathrm{MISE}}$, respectively, given below.



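$$h_{\mathrm{MSE}} = \left[ \frac{\dfrac{f(x)}{1 - G(x)} \int_{-\infty}^{\infty} \Lambda^2(x)\, dx}{f''(x)^2 \left( \int_{-\infty}^{\infty} x^2 \Lambda(x)\, dx \right)^2} \right]^{1/5} n^{-1/5}$$

$$h_{\mathrm{MISE}} = \left[ \frac{\int_{-\infty}^{\infty} \dfrac{f(x)}{1 - G(x)} \omega(x)\, dx \int_{-\infty}^{\infty} \Lambda^2(x)\, dx}{\int_{-\infty}^{\infty} f''(x)^2 \omega(x)\, dx \left( \int_{-\infty}^{\infty} x^2 \Lambda(x)\, dx \right)^2} \right]^{1/5} n^{-1/5}$$
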

In the above expressions, we shall replace $f(x)$ and $f''(x)$ with the infinite-order estimators $\hat f(x)$ and $\hat f_2(x)$ respectively. The bandwidth used in estimating $\hat f_2(x)$, and in general for $\hat f_p(x)$, is the same bandwidth derived from the bandwidth selection algorithm above. The function $1 - G(x)$ is the survival function of the censoring variables; therefore, by replacing $\Delta_j$ with $1 - \Delta_j$, the Kaplan-Meier estimator will give a $\sqrt{n}$-consistent estimate of $1 - G(x)$. Let $\hat h_{\mathrm{MSE}}$ and $\hat h_{\mathrm{MISE}}$ refer to the plug-in estimates corresponding to $h_{\mathrm{MSE}}$ and $h_{\mathrm{MISE}}$ respectively. These estimators have rapid convergence rates due to the ultra-fast convergence of the pilot flat-top estimators, and this is revealed in the following theorem.
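
A plug-in bandwidth of this kind is a closed-form function of the pilot values. The sketch below assumes the pointwise bandwidth obtained by balancing the squared-bias term of order h^4 against the variance term of order 1/(nh), namely h = [ (f/(1−G)) ∫Λ² / ( f''² (∫x²Λ)² ) ]^{1/5} n^{-1/5}. The function name `plugin_h_mse`, its argument names, and the default kernel constants (those of a standard normal kernel) are assumptions made for this example.

```python
import math


def plugin_h_mse(f_hat, f2_hat, cens_surv, n,
                 roughness=1.0 / (2.0 * math.sqrt(math.pi)), mu2=1.0):
    """h = [ f/(1-G) * int K^2 / (f''^2 * (int x^2 K)^2) ]^(1/5) * n^(-1/5).

    f_hat, f2_hat : pilot estimates of f(x) and f''(x) at the point x
    cens_surv     : estimate of 1 - G(x), the censoring survival function
    roughness     : int K^2 (default: standard normal kernel)
    mu2           : int x^2 K (default: standard normal kernel)
    """
    if f2_hat == 0.0 or cens_surv <= 0.0:
        raise ValueError("pilot values give an undefined bandwidth")
    ratio = (f_hat / cens_surv) * roughness / (f2_hat ** 2 * mu2 ** 2)
    return ratio ** 0.2 * n ** (-0.2)


# Standard normal density at x = 0: f(0) = 1/sqrt(2*pi) and f''(0) = -f(0).
f0 = 1.0 / math.sqrt(2.0 * math.pi)
h_uncensored = plugin_h_mse(f0, -f0, cens_surv=1.0, n=100)
h_censored = plugin_h_mse(f0, -f0, cens_surv=0.5, n=100)
```

Heavier censoring at x (a smaller value of 1 − G(x)) inflates the variance term and therefore produces a larger bandwidth, as the second call illustrates.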

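Theorem 4. Assume the conditions of Theorem 3 and assume conditions strong enough to ensure (6) holds for $p = 2$. Let $\hat h_M$ be either $\hat h_{\mathrm{MSE}}$ or $\hat h_{\mathrm{MISE}}$ with $h_M$ being the corresponding $h_{\mathrm{MSE}}$ or $h_{\mathrm{MISE}}$.

(i) Assume $\phi(t) \sim A|t|^{-d}$ for some positive constants $A$ and $d > 3$. Then

$$\hat h_M = h_M \left( 1 + o_P\left( \left( \frac{\log n}{n} \right)^{\lceil d - 4 \rceil/(2d)} \right) \right).$$

(ii) Assume $\phi(t) \sim A\xi^{|t|}$ for some $\xi \in (0, 1)$ and $A > 0$. Then

$$\hat h_M = h_M \left( 1 + O_P\left( \frac{(\log n)^{3/2}}{\sqrt{n}} \right) \right).$$

(iii) Assume $|\phi(t)| = 0$ when $t \ge b$. Then

$$\hat h_M = h_M \left( 1 + O_P\left( \frac{1}{\sqrt{n}} \right) \right).$$
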

Marron and Padgett [10] suggest cross-validation as a means of minimizing the integrated square error (ISE), but this approach of minimizing ISE was shown in [?] to be less optimal than minimizing the MISE. In particular, the relative convergence rate (as in the above theorem) of the cross-validation approach in [10] is $n^{-1/10}$, regardless of the degree of smoothness of $f(x)$. If one uses the plug-in approach that we have adopted above but with pilots consisting of second-order kernels, then the relative convergence rates are at best $n^{-2/5}$, again regardless of the degree of smoothness of $f(x)$. All of these rates are considerably slower than the $n^{-1/2}$ rate afforded by the proposed procedure under a sufficiently smooth density $f(x)$ (i.e. when $\phi(t)$ has a rapid decay to zero), as Theorem 4 demonstrates.



5 Simulations 

We construct our infinite-order kernel from a "flat-top" function $\kappa$ in (3) with any choice of continuous, square-integrable function $g$; an easy choice is $g(x) = (1 + c - c|x|)^+$, which gives $\kappa$ a trapezoidal shape. Other possibilities for the function $g$ are considered in [?]. For the following simulations, we focus on a trapezoidal shape for $\kappa$; we are still left to determine the parameter $c$ controlling the slopes on the sides of the trapezoid. The parameter $c = 4$ seemed to work generally well and was used throughout all of the simulations, but there is certainly some flexibility in the choice of $c$; see [16] for further discussion on choosing the parameter $c$.
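
For the trapezoidal choice of κ, the Fourier integral defining the kernel can be evaluated in closed form. Writing b = 1 + 1/c for the point where the trapezoid reaches zero, a direct calculation (not quoted from the paper) gives K(x) = (cos x − cos bx) / (π (b − 1) x²), with limiting value (b + 1)/(2π) at x = 0. The sketch below implements this formula; the function name `trapezoid_kernel` and the cutoff used near zero are illustrative choices.

```python
import math


def trapezoid_kernel(x, c=4.0):
    """Kernel K(x) whose Fourier-domain shape is the trapezoid with
    kappa(t) = 1 on |t| <= 1 and kappa(t) = (1 + c - c|t|)^+ otherwise."""
    b = 1.0 + 1.0 / c  # kappa vanishes for |t| >= b
    if abs(x) < 1e-4:
        return (b + 1.0) / (2.0 * math.pi)  # limiting value at x = 0
    return (math.cos(x) - math.cos(b * x)) / (math.pi * (b - 1.0) * x * x)


# Numerical check that the kernel integrates to (approximately) one.
step = 0.01
total = sum(trapezoid_kernel(-100.0 + step * i) for i in range(20001)) * step
```

The kernel takes negative values on its side lobes (for example at x = π), which is why the resulting density estimate may need the truncation and renormalization discussed in the introduction.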

In many situations, particularly involving censored data, the support is known to lie in a half-line or some compact interval, and unaltered versions of kernel density estimators are not even consistent near the boundary points. However, there have been many fixes to this boundary issue (see [?] for a survey of several methods), and we adopt the simple reflection principle to resolve boundary problems in our estimator. Specifically, when the density is known to have its support on $[0, \infty)$, we use the estimator $\tilde f(x) = \hat f(x) + \hat f(-x)$ to ensure consistency near the boundary point $x = 0$; see [19] and [20] for discussions of this method in the uncensored iid context.
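
The reflection principle amounts to folding the mass that an unreflected estimate places below zero back onto the positive half-line. A minimal sketch follows; the helper name `reflect_estimate` and the stand-in function `raw_estimate` (a normal curve centred near the boundary, used here in place of an actual kernel estimate) are assumptions made for this example.

```python
import math


def reflect_estimate(f_hat, x):
    """Boundary-corrected value f_hat(x) + f_hat(-x) for x >= 0, else 0."""
    if x < 0.0:
        return 0.0
    return f_hat(x) + f_hat(-x)


# Stand-in for an unreflected estimate (an assumption for illustration):
# a normal curve centred at 0.2 that leaks mass below zero.
def raw_estimate(x, centre=0.2, h=0.5):
    z = (x - centre) / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))


grid = [0.005 + 0.01 * i for i in range(600)]  # midpoints on [0, 6]
area = sum(reflect_estimate(raw_estimate, x) for x in grid) * 0.01
```

The reflected estimate is supported on the half-line and keeps total mass one, which the midpoint sum in the last line checks numerically.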

We mimic the simulation presented in the introduction but with censoring. Independent and identically distributed lifetime variables are drawn from a N(0, 1) distribution and, independently, the censoring variables are drawn from the same distribution. Therefore we would expect to see about 50% censoring on average. The challenger to our infinite-order density estimator is (1) with Gaussian kernel K. The cross-validation criterion is suggested in [13] for selecting the bandwidth for the Gaussian kernel, but one of its drawbacks is the computational time required to compute it, which is greatly magnified over several thousand realizations. So instead of computing the cross-validations, we gave the Gaussian kernel a distinct advantage by choosing the bandwidth for which it performs best (these were determined by finite-sample simulation). These optimal bandwidths are shown next to their corresponding MSE values in the table below. For comparison, we have also included the MSE values for the infinite-order estimator with its best-choice bandwidth.





                      x = 0                 x = 1                 x = 2             avg on [-2,2]
n                  50         500        50         500        50         500        50         500
MSE_infinite       6.40       .642       6.24       .795       2.76       1.01       4.60       .622
MSE_infinite †     3.18 (.60) .470 (.50) 1.85 (1.00) .139 (1.00) 1.78 (.75) .394 (.65) 2.92 (.65) .425 (.55)
MSE_Gaussian †     5.64 (.50) 1.15 (.30) 5.18 (1.00) .448 (.65) 1.63 (.80) .620 (.55) 5.46 (.65) .779 (.40)

MSE values are blown up by 10^3 for easier comparison.

† Optimal bandwidths were used; their values are given in parentheses.

Table 2: Comparison of the infinite-order kernel with the Gaussian kernel on censored data from lifetime and censoring variables that are iid N(0,1).



Comparing the two kernels with their respective optimal bandwidths, the infinite- 
order estimator is clearly the better choice, and even when the bandwidth selection 
algorithm is used, the infinite-order estimator outperforms the Gaussian estimator 
with optimal bandwidth at the origin and on the interval [-2,2]. So given pretty much 
any bandwidth selection rule for the Gaussian kernel, the infinite-order estimator is 
bound to be more accurate over each criterion. 

The next simulation uses the same data, but this time we wish to estimate the hazard function. Our infinite-order estimate of the hazard function is $\hat f(x)/\tilde S(x)$ where $\hat f(x)$ is the usual infinite-order density estimator and $\tilde S(x)$ is a smoothed Kaplan-Meier estimator (the R function ksmooth was applied to $\hat S$ to produce $\tilde S(x)$). The other two estimators are from the R packages muhaz and logspline. The muhaz estimator is based on the paper [14], and for this simulation the boundary correction is turned off and both global and local bandwidths are invoked (denoted muhaz-g and muhaz-l respectively). These estimators behave somewhat erratically with small n, so we have changed the sample sizes to 100 and 1000, and we have limited the range of values to [-1,1]. The logspline estimator (based on [?]) uses splines to estimate the density, and the result is then divided by the smoothed Kaplan-Meier estimate to give an estimate of the hazard function.
                    x = 0              x = 1           avg on [-1,1]
n                100      1000      100      1000      100      1000
MSE_infinite     .0177    .00243    .342     .0334     .0350    .00469
MSE_muhaz-g      .0478    .0137     .979     .261      .106     .0439
MSE_muhaz-l      .0239    .00293    .407     .0718     .0557    .00750
MSE_logspline    .0174    .00284    .204     .119      .0354    .0200

Table 3: Comparison of the infinite-order kernel estimator of the hazard function with the muhaz estimator (with global and local bandwidth selection) and logspline estimator. The lifetime and censoring variables are both iid N(0,1).



With a large enough sample size, the infinite-order estimator is expected to outperform its competitors; this is witnessed in the above simulation, as the infinite-order estimator has the smallest MSE in each category at n = 1000 with 50% censoring.

The previous simulation may be considered more of a theoretical comparison since in most applications the censored data are nonnegative. Therefore in the next simulation we use lifetime and censoring variables drawn from lognormal distributions with means 0 and .5 and standard deviations .5 and .5 respectively (values are given on the log scale). Again, due to limitations of the other estimators, we consider datasets of size 100 and 1000, and we only consider the estimates on the interval [0,1.5]. Although the lifetime distribution has support on the positive reals, its density takes the value 0 at the origin, so a boundary correction is not necessary for this simulation.




Figure 1: (a) Plot of the lifetime and censoring lognormal densities with the survival function. (b) Plot of the hazard function and one realization of the three estimators with n = 100.
                     x = 0                x = .75             x = 1.5          avg on [0,1.5]
n                100       1000       100      1000       100      1000      100      1000
MSE_infinite     .000280   2.26e-06   .0527    .00555     .173     .0115     .0502    .0115
MSE_muhaz-g      .00462    .000147    .0445    .0111      .429     .0998     .0883    .0203
MSE_muhaz-l      .118      .0320      .0478    .0206      .1451    .0585     .0809    .0239
MSE_logspline    .000169   4.51e-06   .0757    .00873     .134     .0303     .0661    .0123

Table 4: Comparison of the infinite-order kernel estimator of the hazard function with the muhaz estimator and logspline estimator on lognormal data.



Once again, with the larger data size, the infinite-order estimator in this example 
outperforms the other estimators in terms of MSE performance. The muhaz estimator 
is particularly suited for dealing with boundary effects and adapting its bandwidth ap- 
propriately as is shown in the next simulation. In the following simulation, the lifetime 
variables have an exponential distribution with mean one and the censoring variables 
have an exponential distribution with mean four. Therefore the hazard function in this 
model is constant with value one. 
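
As a short worked check of this setup (a derivation added for illustration, using only the stated exponential model with lifetime mean one and censoring mean four), the hazard is constant and the expected proportion of censored observations is 20%:

```latex
% Lifetime X^0 ~ Exp(mean 1): f(x) = e^{-x}, S(x) = e^{-x} for x >= 0.
H(x) = \frac{f(x)}{S(x)} = \frac{e^{-x}}{e^{-x}} = 1, \qquad x \ge 0.

% Censoring U ~ Exp(mean 4), independent of X^0, with density (1/4) e^{-u/4}.
P(U < X^0) = \int_0^\infty \tfrac{1}{4} e^{-u/4}\, P(X^0 > u)\, du
           = \int_0^\infty \tfrac{1}{4} e^{-u/4} e^{-u}\, du
           = \frac{1/4}{1 + 1/4} = \frac{1}{5}.
```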



                    x = 0              x = .75            x = 1.5          avg on [0,1.5]
n                100      1000      100      1000      100      1000      100      1000
MSE_infinite     .0318    .00419    .0216    .00361    .0676    .00829    .0316    .00769
MSE_muhaz-g      .0425    .00514    .0430    .00425    .419     .0125     .0835    .00560
MSE_muhaz-l      .0356    .00299    .0251    .00176    .209     .00382    .0562    .00235
MSE_logspline    .518     .503      .0319    .00427    .0436    .00610    .0648    .0218

Table 5: Comparison of the infinite-order kernel estimator of the hazard function with the muhaz estimator and logspline estimator on exponential data.



Here we see the infinite-order estimator doing best with the smaller sample size, and it improves with n, but its performance is not as good as the muhaz estimator with local bandwidth selection when n = 1000. We expected this behavior, and we describe two reasons that account for the asymptotic performance of the infinite-order estimator. The first reason follows directly from Theorem 1: since the characteristic function of the exponential distribution behaves like $|\phi(t)| \sim 1/t$, Theorem 1 indicates there is no benefit to using the infinite-order kernel. In the previous examples, the normal and lognormal distributions have characteristic functions that behave like $|\phi(t)| \sim e^{-t^2/2}$ and $|\phi(t)| \sim 1/t^{\log t}$ respectively (refer to [?] for the derivation of the characteristic function of a lognormal distribution), and in both cases Theorem 1 implies a significant MSE improvement with large n. The second reason is the lack of a local bandwidth selection procedure for the infinite-order estimator. Simulations computed the optimal bandwidths in each of the four scenarios in Table 5 for the infinite-order estimator, and the optimal bandwidths were found to be .1, 1.1, .5, and 1.5, which vary widely (compare with the optimal bandwidths in Table 2). Therefore a localized bandwidth procedure would be particularly useful in this situation.
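
The decay rate quoted for the exponential distribution can be verified directly. The following short derivation is added for illustration and uses only the unit-mean exponential density:

```latex
\phi(t) = \int_0^\infty e^{itx} e^{-x}\, dx = \frac{1}{1 - it},
\qquad
|\phi(t)| = \frac{1}{\sqrt{1 + t^2}} \sim \frac{1}{|t|} \quad (|t| \to \infty).

% Consequently, for every r > 0,
\int_{-\infty}^{\infty} |t|^r |\phi(t)|\, dt = \infty,
% so Assumption A(r) fails and part (i) of Theorem 1 gives no rate improvement.
```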



6 Conclusions

The proposed infinite-order estimator together with its tailored bandwidth selection algorithm produces a nearly $\sqrt{n}$-convergent nonparametric estimator in many standard situations. Even in the least ideal situation of a slow decay of the characteristic function to zero (i.e., when the pdf is not very smooth), the estimator still holds up and can outperform existing methods in many situations. One of the nicest qualities of this estimator is its simplicity: we used the exact same kernel throughout all of the simulations, so no parameter estimation was involved in choosing the kernel, and the accompanying bandwidth selection algorithm requires very little computation to implement. Finally, the proposed estimator is very robust, and since no parameter estimation is involved, it succeeds in estimating the hazard function and density in small sample sizes where competing estimators like muhaz and logspline fail to even produce an estimate.



A Technical Proofs 



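Proof of Theorem 1

Proof. Using the identity

$$\frac{1}{h} K\left(\frac{x}{h}\right) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \kappa(th) e^{-itx}\, dt \qquad (8)$$

and the sample characteristic function given by $\hat\phi(t) = \sum_{j=1}^{n} s_j e^{itX_j}$, the estimator (1) can be written as

$$\hat f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat\phi(t) \kappa(th) e^{-itx}\, dt. \qquad (9)$$

Since [...] = 1, we have $E\hat\phi(t) = \phi(t)$ and from the representation in (9), the expectation of $\hat f(x)$ is

$$E[\hat f(x)] = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi(t) \kappa(th) e^{-itx}\, dt.$$

Since $\phi(t)$ is the inverse Fourier transform of $f(x)$, $f(x)$ is therefore the Fourier transform of $\phi(t)$; i.e.

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \phi(t) e^{-itx}\, dt. \qquad (10)$$

Therefore the bias of $\hat f(x)$ is

$$\operatorname{bias}(\hat f(x)) = E[\hat f(x)] - f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} (\kappa(th) - 1) \phi(t) e^{-itx}\, dt.$$

But since $\kappa(th) = 1$ for $|t| \le 1/h$, we have

$$\operatorname{bias}(\hat f(x)) = \frac{1}{2\pi} \int_{|t| > 1/h} (\kappa(th) - 1) \phi(t) e^{-itx}\, dt.$$

Since $|\kappa(t)| \le 1$ for all $t$, $|\kappa(th) - 1| \le 2$ for all $h$ and $t$. We can then bound the bias by

$$|\operatorname{bias}(\hat f(x))| \le \frac{1}{\pi} \int_{|t| > 1/h} |\phi(t)|\, dt.$$

Under the assumption $\int |t|^r |\phi(t)|\, dt < \infty$ in (i), we have

$$\int_{|t| > 1/h} |\phi(t)|\, dt = \int_{|t| > 1/h} \frac{|t|^r |\phi(t)|}{|t|^r}\, dt \le h^r \int_{|t| > 1/h} |t|^r |\phi(t)|\, dt = o(h^r).$$

If the bias is $o(h^r)$ and the variance is $O\left(\frac{1}{nh}\right)$, then we wish to choose $h$ such that $h^{2r} \sim \frac{1}{nh}$, which occurs if $h \sim a n^{-\beta}$ with $\beta = (2r+1)^{-1}$. With this choice of $h$, we have

$$\sup_{x \in \mathbb{R}} \left| \operatorname{bias}\{\hat f(x)\} \right| = o\left(n^{-r/(2r+1)}\right) \quad\text{and}\quad \operatorname{MSE}\{\hat f(x)\} = O\left(n^{-2r/(2r+1)}\right).$$

This proves part (i).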
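
The bandwidth choice in part (i) comes from balancing the squared bias against the variance. Spelled out as a worked step (added for illustration; constants are omitted):

```latex
h^{2r} \asymp \frac{1}{nh}
\;\Longleftrightarrow\; h^{2r+1} \asymp \frac{1}{n}
\;\Longleftrightarrow\; h \asymp n^{-1/(2r+1)},

% and substituting back,
h^{r} \asymp n^{-r/(2r+1)}, \qquad
h^{2r} \asymp \frac{1}{nh} \asymp n^{-2r/(2r+1)}.

% Example: r = 4 gives h \asymp n^{-1/9} and an MSE of order n^{-8/9}.
```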

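Under the assumption $|\phi(t)| \le D e^{-d|t|}$ for some positive constants $d$ and $D$, we have

$$\int_{|t| > 1/h} |\phi(t)|\, dt \le D \int_{|t| > 1/h} e^{-d|t|}\, dt = \frac{2D}{d} e^{-d/h}.$$

So the bias is $O(e^{-d/h})$, and letting $h \sim 1/(a \log n)$ gives a squared bias of

$$O\left(e^{-2d/h}\right) = O\left(e^{-2da \log n}\right) = O\left(n^{-2da}\right)$$

and a variance of

$$O\left(\frac{1}{nh}\right) = O\left(\frac{\log n}{n}\right).$$

Therefore if $a > 1/(2d)$, then

$$\sup_{x \in \mathbb{R}} \left| \operatorname{bias}\{\hat f(x)\} \right| = O\left(\frac{1}{\sqrt{n}}\right) \quad\text{and}\quad \operatorname{MSE}\{\hat f(x)\} = O\left(\frac{\log n}{n}\right).$$

This proves part (ii).

Under the assumption $\phi(t) = 0$ when $|t| \ge b$, we have

$$\int_{|t| > 1/h} |\phi(t)|\, dt = 0$$

when $h \le 1/b$. So by letting $h \le 1/b$, we have

$$\sup_{x \in \mathbb{R}} \left| \operatorname{bias}\{\hat f(x)\} \right| = 0 \quad\text{and}\quad \operatorname{MSE}\{\hat f(x)\} = O\left(\frac{1}{n}\right),$$

which completes the proof of the theorem. □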



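Proof of Theorem 2

Proof. By taking the $p$th derivative on both sides of the identity (8), we have

$$\frac{1}{h^{p+1}} K^{(p)}\left(\frac{x}{h}\right) = \frac{1}{2\pi} \int_{-\infty}^{\infty} (-it)^p \kappa(th) e^{-itx}\, dt.$$

By taking the $p$th derivative on both sides of the identity (10), we have

$$f^{(p)}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} (-it)^p \phi(t) e^{-itx}\, dt.$$

Following the steps in (9), we have

$$\hat f_p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} (-it)^p \hat\phi(t) \kappa(th) e^{-itx}\, dt,$$

and we can now compute the bias of $\hat f_p(x)$ to be

$$\operatorname{bias}(\hat f_p(x)) = \frac{1}{2\pi} \int_{-\infty}^{\infty} (-it)^p (\kappa(th) - 1) \phi(t) e^{-itx}\, dt.$$

Proceeding as in the proof of Theorem 1, this bias is bounded as

$$|\operatorname{bias}(\hat f_p(x))| \le \frac{1}{\pi} \int_{|t| > 1/h} |t|^p |\phi(t)|\, dt.$$

Under assumption A(r + p), we have

$$\int_{|t| > 1/h} |t|^p |\phi(t)|\, dt = \int_{|t| > 1/h} \frac{|t|^{r+p} |\phi(t)|}{|t|^r}\, dt \le h^r \int_{|t| > 1/h} |t|^{r+p} |\phi(t)|\, dt = o(h^r).$$

If the bias is $o(h^r)$ and the variance is $O\left(\frac{1}{n h^{p+1}}\right)$, then we wish to choose $h$ such that $h^{2r} \sim \frac{1}{n h^{p+1}}$, which occurs if $h \sim a n^{-\beta}$ with $\beta = (2r + p + 1)^{-1}$. With this choice of $h$, we have

$$\sup_{x \in \mathbb{R}} \left| \operatorname{bias}\{\hat f_p(x)\} \right| = o\left(n^{-r/(2r+p+1)}\right) \quad\text{and}\quad \operatorname{MSE}\{\hat f_p(x)\} = O\left(n^{-2r/(2r+p+1)}\right).$$

Under assumption B,

$$\int_{|t| > 1/h} |t|^p |\phi(t)|\, dt \le D \int_{|t| > 1/h} |t|^p e^{-d|t|}\, dt = O\left(h^{-p} e^{-d/h}\right).$$

Under assumption C,

$$\int_{|t| > 1/h} |t|^p |\phi(t)|\, dt = 0$$

when $h \le 1/b$. Finally, the bias and MSE results for parts (ii) and (iii) now follow along the same lines as Theorem 1. □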

Proof of Theorem 3

Proof. The proof is very similar to the proof of Theorem 3 in [?] with little modification. □

Proof of Theorem 4

Proof. Parts (ii) and (iii) follow from Theorems 1 and 2 and the $\delta$-method. The convergence of $\hat h_M$ in part (i) is dictated by the slowly converging $\hat f''(x)$. However, the convergence rate of $\hat h_M$ is unhampered by the convergence rate of $\hat h$; for instance, if $h$ is replaced with the random quantity $h(1 + o_P(1))$ (refer to the proof of Lemma 2 in [3]) then Theorem 1 is still valid. If $\phi(t) \sim A|t|^{-d}$, then by Theorem 3,

$$\hat h \overset{P}{\sim} \tilde A \left(\frac{\log n}{n}\right)^{1/(2d)}.$$

From Theorem 2 part (i), if

$$\int_{-\infty}^{\infty} |t|^{r+2} |\phi(t)|\, dt < \infty, \qquad (11)$$

then the bias of $\hat f''(x)$ is $o(h^r)$. In order for (11) to be satisfied, $r$ must be less than $d - 3$, so we let $r = \lceil d - 4 \rceil$. Therefore the bias of $\hat f''(x)$ (which dominates the MSE of $\hat f''(x)$) is

$$o\left( \left(\frac{\log n}{n}\right)^{\lceil d - 4 \rceil/(2d)} \right),$$

and coupled with the $\delta$-method, part (i) of Theorem 4 is now proved. □